AITopics

2605.03493

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Industry:

Information Technology (0.68)
Media > Film (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Education > Educational Setting (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Kocák, Tomáš, Neu, Gergely, Valko, Michal

Online learning with Erdős-Rényi side-observation graphs

arXiv.org Machine Learning, Apr-29-2026

We consider adversarial multi-armed bandit problems where the learner is allowed to observe losses of a number of arms besides the arm that it actually chose. We study the case where all non-chosen arms reveal their loss with a fixed but unknown probability $r$, independently of each other and the action of the learner. We propose two algorithms that work for different ranges of $r$. We show that after $T$ rounds in a bandit problem with $N$ arms, the expected regret of our first algorithm is $O(\sqrt{(T /r) \log N })$ whenever $r\ge(\log T)/(2N)$, while our second algorithm achieves a regret of $O(\sqrt{(T/r) \log (N+T)})$ for smaller values of $r$. We also give a quick estimation procedure that decides the range of $r$. All our bounds are within logarithmic factors of the best achievable performance of any algorithm that is even allowed to know $r$.
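To make the observation model concrete, here is a minimal Python sketch of one round of exponential weights under Erdős-Rényi side observations. Unlike the paper, it assumes $r$ is known, so each arm's observation probability $p_i + (1-p_i)r$ can be plugged directly into an importance-weighted loss estimate. The function name `exp_weights_er_round` and its parameters are hypothetical; this illustrates the feedback model, not either of the paper's two algorithms.

```python
import math
import random


def exp_weights_er_round(weights, losses, r, eta, rng):
    """One round of exponential weights with Erdős-Rényi side observations.

    Hypothetical simplification: r is assumed KNOWN here (the paper treats it as unknown).
    weights: current positive weights; losses: this round's losses in [0, 1].
    Returns (chosen_arm, updated_weights).
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    # Sample the played arm from the exponential-weights distribution.
    arm = rng.choices(range(len(weights)), weights=probs, k=1)[0]
    new_weights = []
    for i, (w, p, loss) in enumerate(zip(weights, probs, losses)):
        # The played arm is always observed; every other arm independently with prob. r.
        observed = (i == arm) or (rng.random() < r)
        # Marginal probability that arm i's loss is revealed: p_i + (1 - p_i) * r.
        obs_prob = p + (1.0 - p) * r
        # Importance weighting makes the estimate unbiased for the true loss.
        estimate = loss / obs_prob if observed else 0.0
        new_weights.append(w * math.exp(-eta * estimate))
    return arm, new_weights
```

With `r=1.0` every loss is revealed and the update reduces to full-information exponential weights; with small `r`, unobserved arms get a zero estimate and observed ones are scaled up by at most $1/r$.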

artificial intelligence, data mining, machine learning

2604.25271

Country: Europe (0.68)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.51)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, Apr-28-2026

Efficient learning by implicit exploration in bandit problems with side observations

Kocák, Tomáš, Neu, Gergely, Valko, Michal, Munos, Rémi

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition to its own loss, the learner also gets to observe losses of some other actions. The revealed losses depend on the learner's action and a directed observation system chosen by the environment. For this setting, we propose the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions. Along similar lines, we also define a new partial information setting that models online combinatorial optimization problems where the feedback received by the learner is between semi-bandit and full feedback. As the predictions of our first algorithm cannot always be computed efficiently in this setting, we propose another algorithm with similar properties and with the benefit of always being computationally efficient, at the price of a slightly more complicated tuning mechanism. Both algorithms rely on a novel exploration strategy called implicit exploration, which is shown to be more efficient both computationally and information-theoretically than previously studied exploration strategies for the problem.
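The sketch below illustrates an implicit-exploration style loss estimate in Python, assuming the form $\hat\ell_i = \ell_i \mathbb{1}\{i \text{ observed}\}/(o_i+\gamma)$, where $o_i$ is the probability that action $i$'s loss is revealed under the learner's current distribution and $\gamma \ge 0$ is the exploration parameter. The set-based encoding of the observation system and the function names are hypothetical. Because $o_i$ only needs the current round's observation system, it can be computed after the action is played.

```python
import math


def ix_loss_estimates(probs, played, observes, losses, gamma):
    """Implicit-exploration (IX) style loss estimates for one round (sketch).

    probs: the learner's distribution over N actions this round.
    played: index of the action actually played.
    observes: observes[j] is the set of actions whose losses are revealed when
        j is played (hypothetical encoding; j is treated as observing itself).
    losses: this round's losses in [0, 1] (only revealed entries are used).
    gamma: implicit-exploration parameter, gamma >= 0.
    """
    n = len(probs)
    estimates = [0.0] * n
    revealed = observes[played] | {played}
    for i in range(n):
        # o_i: probability that i's loss is revealed under probs.
        o_i = sum(probs[j] for j in range(n) if i in observes[j] or j == i)
        if i in revealed:
            # Adding gamma to the denominator biases the estimate downward
            # (optimistically) instead of mixing in explicit uniform exploration.
            estimates[i] = losses[i] / (o_i + gamma)
    return estimates


def exp_weights_update(weights, estimates, eta):
    """Standard exponential-weights update with the given loss estimates."""
    return [w * math.exp(-eta * est) for w, est in zip(weights, estimates)]
```

With `gamma=0` this is the plain importance-weighted estimate; a positive `gamma` trades a small downward bias for estimates bounded by `1/gamma`.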

artificial intelligence, data mining, machine learning

2604.24555

Country: Europe (0.46)

Genre: Research Report (0.40)

Industry: Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Kocák, Tomáš, Neu, Gergely, Valko, Michal

Online learning with noisy side observations

arXiv.org Machine Learning, Apr-16-2026

We propose a new partial-observability model for online learning problems where the learner, besides its own loss, also observes some noisy feedback about the other actions, depending on the underlying structure of the problem. We represent this structure by a weighted directed graph, where the edge weights are related to the quality of the feedback shared by the connected nodes. Our main contribution is an efficient algorithm that guarantees a regret of $\widetilde{O}(\sqrt{\alpha^* T})$ after $T$ rounds, where $\alpha^*$ is a novel graph property that we call the effective independence number. Our algorithm is completely parameter-free and does not require knowledge (or even estimation) of $\alpha^*$. For the special case of binary edge weights, our setting reduces to the partial-observability models of Mannor and Shamir (2011) and Alon et al. (2013) and our algorithm recovers the near-optimal regret bounds.
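For undirected graphs with binary edges, the near-optimal bounds in the earlier partial-observability models are stated in terms of the graph's independence number $\alpha$: the size of the largest set of actions with no edges among them. The Python sketch below computes $\alpha$ by brute force for small graphs; it does not compute the paper's effective independence number $\alpha^*$, and the function name is hypothetical.

```python
from itertools import combinations


def independence_number(n, edges):
    """Brute-force independence number of an undirected graph on nodes 0..n-1.

    edges: iterable of (u, v) pairs. Exponential time, so only for small n.
    """
    adjacent = {frozenset(e) for e in edges if e[0] != e[1]}
    # Try subset sizes from largest to smallest; the first independent set found is maximum.
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(frozenset(pair) not in adjacent for pair in combinations(subset, 2)):
                return size
    return 0
```

For example, a path on four actions has $\alpha = 2$, while a complete graph (every action observes every other) has $\alpha = 1$, the full-information extreme.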

artificial intelligence, data mining, machine learning

2604.1374

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)

Genre: Research Report (0.50)

Industry: Education > Educational Setting > Online (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.72)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Neural Information Processing Systems, Feb-10-2026, 05:06:32 GMT

first study of this problem, a tabular assumption is a natural starting point that already posed several new challenges

We will include a list of tasks with more details in the paper.

artificial intelligence, feedback graph, tabular assumption

Technology: Information Technology > Artificial Intelligence (0.76)

Yifan Wu, András György, Csaba Szepesvari

Online Learning with Gaussian Payoffs and Side Observations

Neural Information Processing Systems, Oct-2-2025, 10:47:43 GMT

We consider a sequential learning problem with Gaussian payoffs and side observations: after selecting an action i, the learner receives information about the payoff of every action j in the form of Gaussian observations whose mean is the same as the mean payoff, but the variance depends on the pair (i,j) (and may be infinite). The setup allows a more refined information transfer from one action to another than previous partial monitoring setups, including the recently introduced graph-structured feedback case. For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature. We also provide algorithms that achieve the problem-dependent lower bound (up to some universal constant factor) or the minimax lower bounds (up to logarithmic factors).
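When several Gaussian observations share the same mean but have different known variances, a standard way to pool them is inverse-variance weighting, which is the maximum-likelihood estimate of the common mean. The Python sketch below shows how a learner might combine the observations it has collected about one action $j$, treating infinite variance as "no information". This is a generic statistical building block under the stated assumptions (known, positive variances), not the paper's algorithm; the function name is hypothetical.

```python
import math


def combine_gaussian_observations(observations):
    """Inverse-variance weighted estimate of a common mean (sketch).

    observations: list of (value, variance) pairs, each value assumed drawn from
    N(mu, variance) with the same unknown mu and a known variance > 0.
    variance may be math.inf, meaning the observation carries no information.
    Returns (mean_estimate, variance_of_estimate), or (None, math.inf) if
    no observation is informative.
    """
    precision = 0.0
    weighted_sum = 0.0
    for value, variance in observations:
        if math.isinf(variance):
            continue  # infinite variance: this pair (i, j) reveals nothing about j
        precision += 1.0 / variance
        weighted_sum += value / variance
    if precision == 0.0:
        return None, math.inf
    return weighted_sum / precision, 1.0 / precision
```

Low-variance pairs $(i, j)$ dominate the estimate, which is how the setting allows a more refined information transfer than a binary observed/unobserved graph.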

algorithm, payoff, side observation

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Education > Educational Setting > Online (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.57)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.41)

Neural Information Processing Systems, Sep-30-2025, 09:41:02 GMT

Efficient learning by implicit exploration in bandit problems with side observations

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition to its own loss, the learner also gets to observe losses of some other actions. The revealed losses depend on the learner's action and a directed observation system chosen by the environment. For this setting, we propose the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions. Along similar lines, we also define a new partial information setting that models online combinatorial optimization problems where the feedback received by the learner is between semi-bandit and full feedback. As the predictions of our first algorithm cannot always be computed efficiently in this setting, we propose another algorithm with similar properties and with the benefit of always being computationally efficient, at the price of a slightly more complicated tuning mechanism. Both algorithms rely on a novel exploration strategy called implicit exploration, which is shown to be more efficient both computationally and information-theoretically than previously studied exploration strategies for the problem.

bandit problem, implicit exploration, name change

Industry: Education (0.60)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.60)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Raman Arora, Teodor Vanislavov Marinov, Mehryar Mohri

Bandits with Feedback Graphs and Switching Costs

Neural Information Processing Systems, Aug-20-2025, 03:29:35 GMT

We give two new algorithms for this problem in the informed setting.
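In bandits with switching costs, the learner pays, in addition to the losses of the actions it plays, a cost each time it changes action. The Python helper below evaluates that objective for a fixed action sequence; the per-switch cost `switch_cost` and the `losses[t][a]` layout are illustrative assumptions, not details taken from the paper.

```python
def total_cost(actions, losses, switch_cost=1.0):
    """Cumulative loss plus a fixed cost for every change of action (sketch).

    actions: the action played at each round.
    losses: losses[t][a] is the loss of action a at round t.
    switch_cost: hypothetical cost charged whenever actions[t] != actions[t - 1].
    """
    loss_part = sum(losses[t][a] for t, a in enumerate(actions))
    switches = sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)
    return loss_part + switch_cost * switches
```

A fixed comparator action never switches, so regret against it is measured on losses alone, while the learner pays for every change; this is what pushes algorithms in this setting to switch sparingly.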

algorithm, feedback graph, graph

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > Canada (0.04)

Industry: Education > Educational Setting > Online (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.47)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.30)

Neural Information Processing Systems, Aug-16-2025, 07:09:06 GMT

application, feedback graph, tabular assumption

Technology: Information Technology > Artificial Intelligence (0.76)

Neural Information Processing Systems, May-31-2025, 18:32:38 GMT

Review for NeurIPS paper: Reinforcement Learning with Feedback Graphs

Additional Feedback: This paper addresses the problem of an RL agent that, after executing each action, receives additional observations providing information about transitions it could have experienced. These side observations might be generated, for instance, by auxiliary sensors. The authors formalize this setting by defining a feedback graph based on the additional observations. Feedback graphs may be used by model-based RL algorithms to learn more efficiently. In particular, the authors show that the regret of the resulting model-based algorithm is bounded in terms of certain properties of the graph, rather than depending on the number of states and actions in the original problem (without side observations).

additional observation, algorithm, reinforcement learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)